fix PLE network bug & sort file list for ps trainer #932

liangzhenduo · 2023-06-08T10:06:32Z

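i denotes the task index, so it has to be multiplied by the number of experts per task (exp_per_task), not by the number of tasks (task_num).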
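The indexing fix can be checked with a small standalone sketch in plain Python, without Paddle. It assumes, as the nested loop in models/multitask/ple/net.py implies, that the task-specific experts are stored in one flat list with exp_per_task consecutive entries per task. `flat_indices` is a hypothetical helper written for illustration, not part of PaddleRec.

```python
def flat_indices(task_num, exp_per_task, stride):
    """Flat expert indices visited by the loop
    `for i in range(task_num): for j in range(exp_per_task)`
    when expert (i, j) is looked up at i * stride + j.
    Assumption: experts are stored task by task in one flat list."""
    return [i * stride + j
            for i in range(task_num)
            for j in range(exp_per_task)]


task_num, exp_per_task = 2, 3  # hypothetical sizes: 6 experts in total

# Fixed indexing: stride is the number of experts per task.
print(flat_indices(task_num, exp_per_task, stride=exp_per_task))
# [0, 1, 2, 3, 4, 5]  -> every expert used exactly once

# Original indexing: stride is the number of tasks.
print(flat_indices(task_num, exp_per_task, stride=task_num))
# [0, 1, 2, 2, 3, 4]  -> expert 2 shared by both tasks, expert 5 never used
```

With task_num=3 and exp_per_task=2 the original stride produces indices 0, 1, 3, 4, 6, 7 into a six-element list, so depending on the shapes the bug either silently shares and skips experts or fails with an index error on a plain Python list. The two strides agree only when task_num == exp_per_task, which would hide the bug in a default configuration with equal sizes.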


CLAassistant · 2023-06-08T12:14:04Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.


liangzhenduo · 2024-04-15T08:36:15Z

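The reader was changed because, in distributed training, the file order can differ from node to node (each node's listing is unordered), so after split_file_list different nodes may end up reading the same files. Sorting guarantees that every node sees the file list in the same order, so when the list is split, no two nodes read the same file.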
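A minimal sketch of the failure mode and the fix, in plain Python. The splitter below is a hypothetical contiguous-block `split_file_list` standing in for the real one in reader_helper.py, whose exact splitting rule may differ; the file names and per-node listings are made up. Each node splits its own listing of the same directory, which mirrors what happens when the listing order (for example from `os.listdir`, which Python documents as returning entries in arbitrary order) differs across nodes.

```python
def split_file_list(file_list, trainer_num, trainer_id):
    """Hypothetical contiguous split: trainer `trainer_id` reads one block.
    A stand-in for the real splitter in reader_helper.py, which may differ."""
    block = (len(file_list) + trainer_num - 1) // trainer_num  # ceil division
    start = trainer_id * block
    return file_list[start:start + block]


def assign_shards(listings, sort_first):
    """Each node (trainer) splits its *own* listing of the same directory."""
    trainer_num = len(listings)
    return [
        split_file_list(sorted(files) if sort_first else files, trainer_num, tid)
        for tid, files in enumerate(listings)
    ]


# The same four files, listed in a different order on each node.
listings = [
    ["part-2", "part-0", "part-3", "part-1"],  # node 0
    ["part-0", "part-1", "part-2", "part-3"],  # node 1
]

print(assign_shards(listings, sort_first=False))
# [['part-2', 'part-0'], ['part-2', 'part-3']]  -> part-2 read twice, part-1 never
print(assign_shards(listings, sort_first=True))
# [['part-0', 'part-1'], ['part-2', 'part-3']]  -> disjoint, full coverage
```

The fix needs no coordination between nodes: sorting is a deterministic function of the file names alone, so every trainer derives the same global order and the index-based split becomes a true partition of the files.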

dachr8 · 2024-09-04T09:55:14Z

The PR is correct. The task_init and exp_init parts also have problems, though.

dachr8 · 2024-09-04T09:54:21Z

models/multitask/ple/net.py

        for i in range(0, self.task_num):
            for j in range(0, self.exp_per_task):
-                linear_out = self._param_expert[i * self.task_num + j](
+                linear_out = self._param_expert[i * self.exp_per_task + j](


Update net.py fix a network bug

a122043

i is the task index, so it must be multiplied by the number of experts per task, not by the number of tasks.

liangzhenduo changed the title ~~Update net.py fix a network bug~~ Update net.py fix PLE network bug Jun 8, 2023

liangzhenduo added 5 commits June 12, 2023 14:27

Update net.py

aa3ba7d

fixed code style

Merge branch 'master' into master

12616d3

Merge branch 'master' into master

4cae440

Merge branch 'PaddlePaddle:master' into master

a6d68d6

Update reader_helper.py

4b2f898

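The file order may differ between nodes, so split_file_list may hand the same files to more than one node. Sorting guarantees that the file list is in the same order on every node, so after the split no node reads duplicate files.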

liangzhenduo changed the title ~~Update net.py fix PLE network bug~~ fix PLE network bug & sort file list for ps trainer Apr 15, 2024

dachr8 approved these changes Sep 4, 2024


Merge branch 'master' into master

d7da45f
